Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | 、 | 26 | 日 |
2 | の | 27 | ある |
3 | 。 | 28 | ・ |
4 | を | 29 | 1 |
5 | に | 30 | できる |
6 | は | 31 | 年 |
7 | が | 32 | ます |
8 | た | 33 | 2 |
9 | で | 34 | 月 |
10 | て | 35 | よう |
11 | と | 36 | という |
12 | し | 37 | として |
13 | する | 38 | れる |
14 | も | 39 | この |
15 | いる | 40 | か |
16 | さ | 41 | なる |
17 | な | 42 | なっ |
18 | こと | 43 | 者 |
19 | れ | 44 | 人 |
20 | から | 45 | 3 |
21 | や | 46 | 的 |
22 | ない | 47 | 円 |
23 | など | 48 | 機能 |
24 | だ | 49 | です |
25 | い | 50 | 万 |
The table shows the 50 most frequent words in the corpus. Usually most of them are stopwords.
Language: Afrikaans
This list is a good starting point for a stopword list for the language.
A small, balanced corpus is usually enough to obtain a good list of high-frequency words. However, if a small corpus is dominated by a prominent topic, that topic will show up even in the top word list.
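Below is a minimal Python sketch of how such a top-N list can be derived from raw counts. It assumes the corpus is already tokenized into lists of tokens. Tokenization itself matters a great deal for Japanese and is out of scope here. The function name `top_words` and the toy corpus are hypothetical. The toy data is invented to show how a dominant topic word (here 機能, "function/feature") can outrank stopwords in a small corpus.

```python
from collections import Counter

def top_words(sentences, n=50):
    """Return (rank, word) pairs for the n most frequent tokens.

    `sentences` is assumed to be pre-tokenized: a list of token lists.
    """
    counts = Counter(token for sentence in sentences for token in sentence)
    # most_common() orders by descending count; ties keep first-seen order
    return [(rank, word)
            for rank, (word, _) in enumerate(counts.most_common(n), start=1)]

# Hypothetical toy corpus in which one topic word dominates
corpus = [
    ["機能", "の", "説明", "を", "する"],
    ["新しい", "機能", "が", "ある"],
    ["機能", "の", "一覧", "。"],
]
print(top_words(corpus, n=3))  # [(1, '機能'), (2, 'の'), (3, '説明')]
```

In a larger, balanced corpus the particles and punctuation would be expected to dominate instead. This is why a topic word near the top of the list is worth inspecting before the list is used as a stopword list.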
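```sql
-- List the 50 most frequent words. w_id appears to follow frequency order,
-- with ids up to 100 not used for words; subtracting 100 yields a 1-based rank.
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
```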
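To illustrate the rank offset in the query above, here is a small sketch that runs the same statement against an in-memory SQLite database. Only the table name `words` and the columns `w_id` and `word` come from the query. Two things are assumptions: that `w_id` is assigned in descending frequency order, and that ids 1 to 100 are taken by non-word placeholder rows. The four word rows are the first four entries of the table.

```python
import sqlite3

# Assumed minimal schema: only w_id and word are known from the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# Assumption: ids 1..100 hold placeholder rows that are not real words.
conn.executemany("INSERT INTO words VALUES (?, ?)",
                 [(i, f"<reserved-{i}>") for i in range(1, 101)])
# Real words start at id 101, in descending frequency order.
conn.executemany("INSERT INTO words VALUES (?, ?)",
                 [(101, "、"), (102, "の"), (103, "。"), (104, "を")])

rows = conn.execute(
    "SELECT w_id - 100 AS rank_in_wordlist, word FROM words "
    "WHERE w_id > 100 ORDER BY w_id LIMIT 50"
).fetchall()
print(rows)  # [(1, '、'), (2, 'の'), (3, '。'), (4, 'を')]
```

The `WHERE w_id > 100` filter skips the placeholder rows, and `w_id - 100` maps id 101 to rank 1.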
3.4 Sample words for different frequency ranges